Dialog speech acts and prosody: Considerations for TTS

Authors

  • Ann K Syrdal
  • Yeon-Jun Kim

Abstract

As natural language dialog systems involving both speech recognition and text-to-speech (TTS) synthesis become more sophisticated, the limitations of general-purpose TTS for human-computer dialogs have become more apparent. Much of the subtlety and complexity of meaning in natural language dialogs is conveyed by prosody: how something is said is often as important as what words are spoken. At the same time, advances such as unit selection synthesis have greatly improved the naturalness of synthetic speech, because much less signal processing is required and so there is less distortion. However, the improved naturalness of unit selection synthesis has come at the cost of the more precise prosodic control offered by earlier, more robotic-sounding synthesizers. Our goal is to provide more prosodic and expressive control over unit selection TTS for dialog applications while retaining naturalness. To that end, we have focused on speech acts, which are the communicative function of an utterance. The current working set of speech acts includes:

  • Imperative: directive, request, wait, repeat, warning
  • Interrogative: question-wh, question-yes/no, question-multiple choice
  • Assertive: informative-general, informative-detail
  • Affective: apology, exclamation-positive, exclamation-negative, greeting, good-bye, thanks
  • Others: confirmation, disconfirmation, back-channel, cue phrase

Our work is practically focused, but it also involves some observations of more general interest. We use a relatively small set of speech acts for two purposes. First, we classify utterances in a speech corpus according to their communicative function. Second, we preferentially select speech-act-appropriate units that match the desired speech act of the utterance to be synthesized. The corpus consists of speech read by a female US English speaker, primarily from interactive dialogs of various kinds. This speaker is a voice talent used to build one of our TTS voices.
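The abstract does not say exactly how a speech-act label enters the unit-selection search. One way to picture "preferentially selecting" units is as an extra term in the target cost. A candidate unit cut from an utterance with a different speech-act label pays a penalty, and that penalty is smaller if the unit at least shares the broad category (e.g. another interrogative). The sketch below assumes that formulation. The `Unit` fields, the penalty weights, and the function names are hypothetical rather than the authors' implementation, and join costs and the usual Viterbi search are omitted for brevity.

```python
from dataclasses import dataclass

# Speech-act inventory from the abstract, grouped by broad category.
SPEECH_ACTS = {
    "imperative": ["directive", "request", "wait", "repeat", "warning"],
    "interrogative": ["question-wh", "question-yes/no", "question-multiple choice"],
    "assertive": ["informative-general", "informative-detail"],
    "affective": ["apology", "exclamation-positive", "exclamation-negative",
                  "greeting", "good-bye", "thanks"],
    "other": ["confirmation", "disconfirmation", "back-channel", "cue phrase"],
}

# Reverse lookup: speech act -> broad category.
CATEGORY_OF = {act: cat for cat, acts in SPEECH_ACTS.items() for act in acts}


@dataclass
class Unit:
    """A candidate unit from the corpus (hypothetical fields)."""
    phone: str
    speech_act: str   # label of the utterance the unit was cut from
    base_cost: float  # target cost from all other features


# Hypothetical penalty weights; a real system would tune these.
SAME_CATEGORY_PENALTY = 0.5
DIFFERENT_CATEGORY_PENALTY = 2.0


def speech_act_penalty(target_act: str, unit_act: str) -> float:
    """Extra target cost for a speech-act mismatch between target and unit."""
    if unit_act == target_act:
        return 0.0
    if CATEGORY_OF[unit_act] == CATEGORY_OF[target_act]:
        return SAME_CATEGORY_PENALTY
    return DIFFERENT_CATEGORY_PENALTY


def target_cost(unit: Unit, target_act: str) -> float:
    """Base target cost plus the speech-act mismatch term."""
    return unit.base_cost + speech_act_penalty(target_act, unit.speech_act)


def best_unit(candidates: list[Unit], target_act: str) -> Unit:
    """Pick the lowest-cost candidate (join costs ignored in this sketch)."""
    return min(candidates, key=lambda u: target_cost(u, target_act))


if __name__ == "__main__":
    candidates = [
        Unit("ae", "informative-general", 1.0),
        Unit("ae", "question-wh", 1.3),
        Unit("ae", "question-yes/no", 1.2),
    ]
    print(best_unit(candidates, "question-yes/no").speech_act)
```

With these toy numbers, the exact-match `question-yes/no` unit wins even though the `informative-general` unit has the lowest base cost. Treating the label as a soft cost rather than a hard filter means that synthesis still succeeds when no unit with the right speech act exists.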
We examine prosodic differences of a more “global” nature (mean f0, f0 range, speaking rate, energy level) across the entire set of speech acts. A portion of the database has also been ToBI-labeled and analyzed for systematic differences. There are several significant prosodic differences among the various speech acts. In our current TTS implementation, speech acts are used as an additional feature for selecting speech units for concatenation. The results of analyzing the prosodic features of the various speech acts will also be used to better predict the desired prosodic features. Results thus far are promising, and examples will be demonstrated.
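The abstract names four global measures but does not define how they are computed. Here is a minimal sketch that assumes frame-level f0 and RMS tracks have already been extracted by some pitch tracker. It computes the measures per utterance and then averages them per speech-act label. The `Utterance` fields are hypothetical. So are the specific definitions: f0 range in semitones between the minimum and maximum voiced f0, speaking rate in syllables per second, and energy as mean frame power in dB relative to an amplitude of 1.0. The paper may define these measures differently.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Utterance:
    """One corpus utterance (hypothetical representation)."""
    speech_act: str
    f0: list[float]    # frame-level f0 in Hz; 0.0 marks unvoiced frames
    rms: list[float]   # frame-level RMS amplitude (linear, > 0)
    n_syllables: int
    duration_s: float


def global_prosody(utt: Utterance) -> dict[str, float]:
    """Utterance-level 'global' prosodic features (assumed definitions)."""
    voiced = [f for f in utt.f0 if f > 0]
    return {
        "mean_f0_hz": mean(voiced),
        # Range between the lowest and highest voiced f0, in semitones.
        "f0_range_st": 12 * math.log2(max(voiced) / min(voiced)),
        "rate_syl_per_s": utt.n_syllables / utt.duration_s,
        # Mean frame power in dB relative to an amplitude of 1.0.
        "energy_db": 10 * math.log10(mean(r * r for r in utt.rms)),
    }


def summarize_by_speech_act(utts: list[Utterance]) -> dict[str, dict[str, float]]:
    """Average each feature over all utterances sharing a speech-act label."""
    grouped: dict[str, list[dict[str, float]]] = defaultdict(list)
    for u in utts:
        grouped[u.speech_act].append(global_prosody(u))
    return {
        act: {k: mean(f[k] for f in feats) for k in feats[0]}
        for act, feats in grouped.items()
    }
```

These per-speech-act averages could then serve as prosodic targets or as a basis for significance testing between speech acts. Both uses are illustrations here, not the procedure reported in the paper.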

Similar sources

Speech acts and dialog TTS

The approach outlined in this paper aims to provide better expressivity of unit selection TTS for dialog-intended applications while retaining the natural-sounding voice quality typical of unit selection synthesis. A small set of speech acts was used to annotate a corpus from one female US English speaker. The corpus was composed of speech read primarily from interactive dialogs of various kin...

A perceptual study for modelling speaker-dependent intonation in TTS and dialog systems

In general, most of the developed prosody and intonation models were obtained from a statistical analysis of F0 curves and resynthesis by TTS. But there is yet another opportunity for improving quality and naturalness: effective results can also be obtained by analysing the listeners’ common sense about natural intonational behavior. Therefore, we use a digital process that generates signals representin...

Enriching Text-to-Speech Synthesis Using Automatic Dialog Act Tags

We present an approach for enriching dialog-based text-to-speech (TTS) synthesis systems by explicitly controlling the expressiveness through the use of dialog act tags. The dialog act tags in our framework are automatically obtained by training a maximum entropy classifier on the Switchboard-DAMSL data set, unrelated to the TTS database. We compare the voice quality produced by exploiting autom...

Can Prosody Aid the Automatic Classification of Dialog Acts in Conversational Speech?

Identifying whether an utterance is a statement, question, greeting, and so forth is integral to effective automatic understanding of natural dialog. Little is known, however, about how such dialog acts (DAs) can be automatically classified in truly natural conversation. This study asks whether current approaches, which use mainly word information, could be improved by adding prosodic informati...

Enriching Spoken Language Translation with Dialog Acts

Current statistical speech translation approaches predominantly rely on just text transcripts and do not adequately utilize rich contextual information such as that conveyed through prosody and discourse function. In this paper, we explore the role of context characterized through dialog acts (DAs) in statistical translation. We demonstrate the integration of the dialog acts in a phrase-based st...

Publication date: 2009